Abstract—We consider a network of agents that aim to learn
some unknown state of the world using private observations
and exchange of beliefs. At each time, agents observe private
signals generated based on the true unknown state. Each agent 
might not be able to distinguish the true state based only 
on her private observations. This occurs when some other 
states are observationally equivalent to the true state from the 
agent’s perspective. To overcome this shortcoming, agents must 
communicate with each other to benefit from local observations. 
We propose a model where each agent selects one of her neighbors 
randomly at each time. Then, she refines her opinion using her 
private signal and the prior of that particular neighbor. The
proposed rule can be thought of as modeling a Bayesian agent who
cannot recall the priors on which other agents base their inferences.
This learning without recall approach preserves some aspects 
of the Bayesian inference while being computationally tractable. 
By establishing a correspondence with a random walk on the 
network graph, we prove that under the described protocol, 
agents learn the truth exponentially fast in the almost sure sense. 
The asymptotic rate is expressed as the sum of the relative 
entropies between the signal structures of every agent weighted 
by the stationary distribution of the random walk. 

I. Introduction & Background 

Distributed estimation and detection problems have been an
interesting subject of study in a variety of disciplines, ranging
from control theory to statistics, economics, and signal
processing [ni-ii]. In the distributed detection problem, each
agent observes a sequence of independent and identically
distributed (i.i.d.) private signals generated according to the
true (unknown) state. Suppose that each agent forms a belief
about the true state, represented by a discrete probability
distribution over a finite state space, and sequentially applies
Bayes' rule to her observations at each step. It is well
known [a, cni] that the beliefs formed in this manner
constitute a bounded martingale and converge to a limiting
distribution as the number of observations tends to infinity.
However, the limiting distribution is not necessarily concentrated
on the truth, in which case the agent fails to learn
the true state asymptotically. In fact, in many scenarios, the
agent faces an identification problem where there are states
(other than the true state) that are observationally equivalent
to the true state. In other words, these states induce the same
distribution on her sequence of privately observed signals.
Therefore, rational agents communicate in a social network
to distinguish the truth by relying on local observations. This
leads to the problem of social learning, which is a classical focus
of behavioral microeconomic theory [d, HI] and is also studied
in the context of distributed estimation and statistical learning
theory [fH, ifTSll].
On the other hand, sequentially applying Bayes' rule in
networks can become computationally intractable, since the
global network structure is not available to individuals. This
stems from the fact that agents must use their local data,
which grow with time, to make inferences about the global signal
structure. Therefore, the analysis of rational behavior in networks
is an important problem in Bayesian economics and
has attracted considerable attention [d, iia]. On the other
side of the spectrum lie works such as [ca-iiii], which
study the problem of learning in networks via iterative
applications of non-Bayesian rules. These works characterize the
asymptotic properties of learning and consensus under certain
conditions. More recently, some works (e.g., see [EqI, ED])
have also provided non-asymptotic analyses of the problem.

In this paper, we study a distributed learning model where
each agent observes a sequence of independent and identically
distributed private signals. The structure of the network (which
we assume to be strongly connected) is preset in the sense that
all agents know their local neighborhood before the learning
process. However, they do not necessarily contact all their
neighbors every time. At every epoch of time, each agent
randomly selects one neighbor and uses her neighbor's prior
(rather than her own) in a Bayesian update together
with her private signal at that instant of time. This can be
seen as a learning without recall rule where agents randomly
pick their priors from their local neighborhood. Intuitively,
asymptotic learning occurs since each agent performs a random
walk over a strongly connected graph and picks up
the privately observed signals of the nodes as they are hit
by the random walk. We show that the learning rate for
such an agent is exponentially fast, with an asymptotic rate
that can be expressed as the weighted sum of the relative
entropies between the likelihood structures of each agent
under various states of the world, where the weights are
the agents' probabilities in the stationary distribution of the random
walk. In many distributed learning models over random and
switching networks, agents must have positive self-reliance at
all times. One can observe this condition, for instance, in
gossip algorithms [22] and ergodic stationary processes [EH].
An interesting and subtle point in our communication structure
is the relaxation of this condition, as our agents rely entirely
on the beliefs of their neighbors every time that they select
a neighbor to gossip with. Moreover, unlike the majority of
results that rely on the convergence properties of products of
stochastic matrices and are applicable only to irreducible and
aperiodic communication matrices (cf. [?, Proposition 1]), our
results do not require the transition probability matrix to be
aperiodic. This is because our proof of convergence relies on
the ergodic theorem for the almost-sure convergence of the
long-run fraction of time that is spent in any state of a Markov
chain, which holds for any irreducible, positive-recurrent
chain, and in particular for any irreducible, finite-state chain
[25, Theorem 1.5.6]. It is further true that such a chain has a unique
stationary distribution [25, Theorem 1.7.7], which we use to
characterize the almost-sure, exponentially fast asymptotic rate
of convergence under our proposed distributed learning model.
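The claim that aperiodicity is not needed can be checked numerically. The following is a minimal Python/NumPy sketch on a hypothetical three-state chain that is irreducible but periodic (period 2): the matrix powers $P^t$ oscillate and never converge, yet the long-run fraction of time spent in each state still converges to the unique stationary distribution, which is obtained here by solving $\pi P = \pi$ together with $\sum_m \pi_m = 1$. The chain, the seed, and the helper names `stationary_distribution` and `visit_fractions` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 via least squares on the stacked system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def visit_fractions(P, start, steps, seed=0):
    """Simulate the chain and return the fraction of steps spent in each state."""
    rng = np.random.default_rng(seed)
    state = start
    visits = np.zeros(P.shape[0])
    for _ in range(steps):
        state = rng.choice(P.shape[0], p=P[state])
        visits[state] += 1
    return visits / steps

# Hypothetical irreducible chain with period 2: state 0 jumps to 1 or 2,
# and states 1 and 2 always return to 0.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

even_entry = np.linalg.matrix_power(P, 20)[0, 0]  # back at 0 after an even number of steps
odd_entry = np.linalg.matrix_power(P, 21)[0, 0]   # never at 0 after an odd number of steps
pi = stationary_distribution(P)                   # (1/2, 1/4, 1/4)
fractions = visit_fractions(P, start=0, steps=20_000)
```

Comparing `fractions` with `pi` shows the visit frequencies settling on the stationary distribution even though `P` itself has no limiting power.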

The remainder of this paper is organized as follows. The
modeling and formulation are set forth in Section II, where
we present the signal and belief structures and their evolution.
We end Section II with a description of learning without recall
updates in sparse structures where the neighborhood of each
agent has at most one node. Next, in Section III, we show
how the preceding updates can be used even when the agents'
neighborhoods are not singletons. This is achieved by implementing
a gossip-like procedure where a single neighbor is
chosen randomly at every time step and communication is
performed with only one neighbor at a time. We study the
properties of convergence and learning under this procedure
and show a correspondence with random walks on directed
graphs that simplifies our analysis. An illustration is provided
at the end of Section III, and the paper is concluded in
Section IV.


II. The Model 

Notation: Throughout the paper, $\mathbb{R}$ is the set of real
numbers, $\mathbb{N}$ denotes the set of all natural numbers, and
$\mathbb{W} = \mathbb{N} \cup \{0\}$. For a fixed integer $n \in \mathbb{N}$, the set of
integers $\{1, 2, \ldots, n\}$ is denoted by $[n]$, while any other set is
represented by a calligraphic capital letter. The cardinality of
a set $\mathcal{X}$, which is the number of its elements, is denoted by
$|\mathcal{X}|$, and $\mathscr{P}(\mathcal{X}) = \{\mathcal{M} : \mathcal{M} \subseteq \mathcal{X}\}$ denotes the power set of
$\mathcal{X}$, which is the set of all its subsets. The difference of two
sets $\mathcal{X}$ and $\mathcal{Y}$ is defined by $\mathcal{X} \setminus \mathcal{Y} := \{x : x \in \mathcal{X} \text{ and } x \notin \mathcal{Y}\}$.
Boldface letters denote random variables.

We consider a network of $n$ agents that interact according
to a directed graph $\mathcal{G} = ([n], \mathcal{E})$, where $\mathcal{E} \subseteq [n] \times [n]$ is the set
of directed edges. Each agent is labeled with an element of
the set $[n]$. $\mathcal{N}(i) = \{j \in [n] : (j, i) \in \mathcal{E}\}$ is the neighborhood
of agent $i$, which is the set of all agents whose beliefs can be
observed by agent $i$. We let $\deg(i) = |\mathcal{N}(i)|$ be the degree
of node $i$, corresponding to the number of agent $i$'s neighbors.

The Environment: We denote by $\Theta$ the set of states of the
world, which has a finite cardinality. Also, $\Delta\Theta$ represents the
space of all probability measures on the set $\Theta$. Each agent's
goal is to decide amongst the finitely many possibilities in the
state space $\Theta$. A random variable $\boldsymbol{\theta}$ is chosen randomly from $\Theta$
by nature according to the probability measure $\nu(\cdot) \in \Delta\Theta$,
which satisfies $\nu(\hat\theta) > 0$ for all $\hat\theta \in \Theta$ and is referred to as
the common prior. For each agent $i$, there exists a finite signal
space denoted by $\mathcal{S}_i$, and given $\theta$, $\ell_i(\cdot \mid \theta)$ is a probability
measure on $\mathcal{S}_i$, which is referred to as the signal structure
or likelihood function of agent $i$. Furthermore, $(\Omega, \mathcal{F}, \mathbb{P})$ is a
probability triplet, where $\Omega$ is an infinite product space with a
general element $\omega \in \Omega$, and $\mathcal{F} = \mathscr{P}(\Omega)$ is the associated
sigma field. $\mathbb{P}(\cdot)$ is the probability measure on $\Omega$
which assigns probabilities consistently with the common prior
$\nu(\cdot)$ and the likelihood functions $\ell_i(\cdot \mid \theta)$, $i \in [n]$. Conditioned
on $\boldsymbol{\theta}$, the random vectors $\{(\mathbf{s}_{1,t}, \ldots, \mathbf{s}_{n,t}), t \in \mathbb{W}\}$ are
independent. $\mathbb{E}\{\cdot\}$ is the expectation operator, which represents
integration with respect to $d\mathbb{P}(\omega)$.

Signals: Let $t \in \mathbb{W}$ denote the time index and, for each
agent $i$, define $\{\mathbf{s}_{i,t}, t \in \mathbb{W}\}$ to be a sequence of independent
and identically distributed random variables with the probability
mass function $\ell_i(\cdot \mid \boldsymbol{\theta})$; this sequence represents the
private observations made by agent $i$ at each time period $t$.
The privately observed signals are independent and identically
distributed over time, but they could be correlated across the
agents.

Beliefs: We let $\boldsymbol{\mu}_{i,t}(\cdot)$ represent the opinion or belief at
time $t$ of agent $i$ about the realized value of $\boldsymbol{\theta}$. In other words,
$\boldsymbol{\mu}_{i,t}(\cdot)$ is a probability distribution on the set $\Theta$, formed by
agent $i$ at time $t$. Note the randomness of $\boldsymbol{\mu}_{i,t}(\cdot)$ due to
its dependence on the random observations of the agent. The
goal is to study asymptotic learning, i.e., for each agent to
learn the true realized value $\theta \in \Theta$ of $\boldsymbol{\theta}$ asymptotically. This
amounts to having $\boldsymbol{\mu}_{i,t}(\cdot)$ converge to a point mass centered
at $\theta$, where the convergence could be in probability or in the
stronger almost-sure sense that we use in this work.

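At $t = 0$ the value $\boldsymbol{\theta} = \theta$ is selected by nature. Following
that, $\mathbf{s}_{i,0}$ for each $i \in [n]$ is realized and observed by agent $i$.
Then the agent forms an initial Bayesian opinion $\boldsymbol{\mu}_{i,0}(\cdot)$ about
the value of $\boldsymbol{\theta}$. Given $\mathbf{s}_{i,0}$ and using Bayes' rule, for each
agent $i \in [n]$ the initial belief in terms of the observed signal
$\mathbf{s}_{i,0}$ is given by
$$\boldsymbol{\mu}_{i,0}(\hat\theta) = \frac{\nu(\hat\theta)\,\ell_i(\mathbf{s}_{i,0} \mid \hat\theta)}{\sum_{\tilde\theta \in \Theta} \nu(\tilde\theta)\,\ell_i(\mathbf{s}_{i,0} \mid \tilde\theta)}, \quad \forall \hat\theta \in \Theta.$$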
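As a concrete illustration of the initial Bayesian belief above, here is a minimal NumPy sketch with hypothetical numbers: three states, binary signals, a uniform common prior, and a signal structure under which agent $i$ cannot tell the first two states apart. The array layout `ell_i[state, signal]` and the helper `bayes_update` are assumptions made for illustration.

```python
import numpy as np

def bayes_update(prior, likelihood_of_signal):
    """Return prior(θ) * ℓ(s | θ), normalized over θ (Bayes' rule)."""
    unnormalized = prior * likelihood_of_signal
    return unnormalized / unnormalized.sum()

nu = np.full(3, 1 / 3)               # uniform common prior ν over three states
ell_i = np.array([[0.8, 0.2],        # ℓ_i(· | state 1)
                  [0.8, 0.2],        # ℓ_i(· | state 2): identical to state 1
                  [0.3, 0.7]])       # ℓ_i(· | state 3)
s_i0 = 0                             # index of the signal observed at t = 0

mu_i0 = bayes_update(nu, ell_i[:, s_i0])   # approximately [0.421, 0.421, 0.158]
```

Because the first two rows of `ell_i` coincide, no sequence of private signals can ever separate states 1 and 2 for this agent, which is exactly the identification problem described in the introduction.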

Afterwards, at any time $t$ each agent $i$ observes the realized
value of $\mathbf{s}_{i,t}$ as well as the current belief of one of her
neighbors, $\boldsymbol{\mu}_{k,t-1}(\cdot)$, where $k$ is selected randomly from $\mathcal{N}(i)$.
She then forms a refined opinion $\boldsymbol{\mu}_{i,t}(\cdot)$ by incorporating all
the data that have been made available to her by time $t$.
We elaborate on the update rule in the following.


III. Combined Gossip and Without Recall Updates:
Signals Picked Up in a Random Walk


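Consider a digraph $\mathcal{G}$ satisfying $\deg(i) \in \{0, 1\}$ for all $i \in [n]$.
For this class of networks, which includes directed circles and
rooted trees, the authors propose in [na, Ezi] to use the
Bayesian update
$$\boldsymbol{\mu}_{i,t}(\hat\theta) = \frac{\boldsymbol{\mu}_{i,t-1}(\hat\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \hat\theta)}{\sum_{\tilde\theta \in \Theta} \boldsymbol{\mu}_{i,t-1}(\tilde\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \tilde\theta)}, \quad \forall \hat\theta \in \Theta, \tag{1}$$
if $\deg(i) = 0$; and otherwise to use
$$\boldsymbol{\mu}_{i,t}(\hat\theta) = \frac{\boldsymbol{\mu}_{j,t-1}(\hat\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \hat\theta)}{\sum_{\tilde\theta \in \Theta} \boldsymbol{\mu}_{j,t-1}(\tilde\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \tilde\theta)}, \quad \forall \hat\theta \in \Theta, \tag{2}$$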
where $j \in [n]$ is the unique vertex in $\mathcal{N}(i)$. These updates
are a special case of the Learning without Recall rules that
are developed in a companion paper, and they can describe
the behavior of Rational but Memoryless agents who share
a common prior $\nu(\cdot)$ and always interpret their current and
observed beliefs as having stemmed from this common prior,
thus ignoring their entire history of past observations.


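Here we propose applying the Learning without
Recall updates described above to
general networks, by requiring that at every time step $t$, node
$i$ make a random choice from her set of neighbors $\mathcal{N}(i)$
and use that choice for the unique $j$ in (2). To this end,
let $\boldsymbol{\sigma}_t \in \prod_{i \in [n]} \mathcal{N}(i)$, $t \in \mathbb{N}$, be a sequence of independent
and identically distributed random vectors such that, for all $t \in \mathbb{N}$,
$\boldsymbol{\sigma}_{t,i} \in \mathcal{N}(i)$ is the neighbor of $i$ whom she chooses to
communicate with at time $t$. Hence, for all $t$ and any $i$, (2)
becomes
$$\boldsymbol{\mu}_{i,t}(\hat\theta) = \frac{\boldsymbol{\mu}_{\boldsymbol{\sigma}_{t,i},t-1}(\hat\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \hat\theta)}{\sum_{\tilde\theta \in \Theta} \boldsymbol{\mu}_{\boldsymbol{\sigma}_{t,i},t-1}(\tilde\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \tilde\theta)}, \quad \forall \hat\theta \in \Theta. \tag{3}$$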


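To proceed, annex the random choice of neighbors for every
node $i \in [n]$ and all times $t \in \mathbb{N}$ to the original probability
space $(\Omega, \mathcal{F}, \mathbb{P})$ specified in Section II, and for arbitrary $t \in \mathbb{N}$
let $\mathbb{P}\{\boldsymbol{\sigma}_{t,i} = j\} = p_{ij} > 0$. Wherefore,
$p_{ii} < 1$, and $p_{ij} = 0$ whenever $j \notin \mathcal{N}(i) \cup \{i\}$. Let $P$ be
the row-stochastic matrix whose $(i, j)$-th entry is equal to $p_{ij}$.
Let $\mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = j\}} = 1$ if $\boldsymbol{\sigma}_{t,i} = j$, and $\mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = j\}} = 0$ otherwise.
Then (3) can be written as
$$\boldsymbol{\mu}_{i,t}(\hat\theta) = \sum_{j=1}^{n} \mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = j\}} \frac{\boldsymbol{\mu}_{j,t-1}(\hat\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \hat\theta)}{\sum_{\tilde\theta \in \Theta} \boldsymbol{\mu}_{j,t-1}(\tilde\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \tilde\theta)}, \quad \forall \hat\theta \in \Theta. \tag{4}$$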
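Update (3) can be sketched for all agents at once as follows. This is a minimal NumPy sketch, assuming neighbors are chosen uniformly at random ($p_{ij} = 1/\deg(i)$ for $j \in \mathcal{N}(i)$, the choice used in Example 1) and that likelihoods are stored as arrays `ells[i][state, signal]`; the network, the beliefs, and the function names `uniform_neighbor_matrix` and `gossip_without_recall_step` are hypothetical.

```python
import numpy as np

def uniform_neighbor_matrix(neighbors, n):
    """Row-stochastic P with p_ij = 1/deg(i) for j in N(i) and 0 otherwise."""
    P = np.zeros((n, n))
    for i, nbrs in neighbors.items():
        P[i, list(nbrs)] = 1.0 / len(nbrs)
    return P

def gossip_without_recall_step(mu_prev, P, ells, signals, rng):
    """One synchronous round of update (3).

    mu_prev[j] is μ_{j,t-1}; ells[i][θ, s] is ℓ_i(s | θ); signals[i] is s_{i,t}.
    Each agent i draws σ_{t,i} from row i of P, takes that neighbor's previous
    belief as her prior, and multiplies it by her own likelihood.
    """
    n = mu_prev.shape[0]
    sigma = np.array([rng.choice(n, p=P[i]) for i in range(n)])
    mu_new = np.array([mu_prev[sigma[i]] * ells[i][:, signals[i]] for i in range(n)])
    mu_new /= mu_new.sum(axis=1, keepdims=True)      # normalize each row over θ
    return mu_new, sigma

# Hypothetical 3-agent network (0-indexed): N(0) = {1, 2}, N(1) = {2}, N(2) = {0}.
neighbors = {0: [1, 2], 1: [2], 2: [0]}
P = uniform_neighbor_matrix(neighbors, 3)
ells = [np.array([[0.7, 0.3], [0.4, 0.6]])] * 3       # same binary-signal likelihood for everyone
mu_prev = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
rng = np.random.default_rng(1)
mu_new, sigma = gossip_without_recall_step(mu_prev, P, ells, signals=[0, 1, 0], rng=rng)
```

Agents 1 and 2 each have a single neighbor, so for them the step reduces to the deterministic rule (2); only agent 0 actually randomizes.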
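To analyze the propagation of beliefs under (3), we form the
belief ratio
$$\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = \frac{\ell_i(\mathbf{s}_{i,t} \mid \check\theta)}{\ell_i(\mathbf{s}_{i,t} \mid \theta)} \sum_{j=1}^{n} \mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = j\}} \left(\frac{\boldsymbol{\mu}_{j,t-1}(\check\theta)}{\boldsymbol{\mu}_{j,t-1}(\theta)}\right) \tag{5}$$
for any false state $\check\theta \in \Theta \setminus \{\theta\}$ and each agent $i \in [n]$ at
all times $t \in \mathbb{N}$. The above has the advantage of removing
the normalization factor in the denominator from the picture,
so that we can focus instead on the evolution of belief ratios. To
proceed, we take the logarithms of both sides of (5) to obtain
$$\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = \log\frac{\ell_i(\mathbf{s}_{i,t} \mid \check\theta)}{\ell_i(\mathbf{s}_{i,t} \mid \theta)} + \sum_{j=1}^{n} \mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = j\}} \log\frac{\boldsymbol{\mu}_{j,t-1}(\check\theta)}{\boldsymbol{\mu}_{j,t-1}(\theta)}, \tag{6}$$
where the indicator sum may be moved outside the logarithm because exactly one of the indicators equals one.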
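The following minimal sketch, with hypothetical numbers, checks the log-ratio recursion above for a single agent $i$ who contacts neighbor $j$: computing the full normalized update (3) and then taking the log belief ratio gives the same value as adding the agent's own log-likelihood ratio to the neighbor's previous log belief ratio, because the normalization factor cancels.

```python
import numpy as np

# Hypothetical values for three states; index 0 plays the true state θ and
# index 2 a false state θ̌.
mu_j_prev = np.array([0.5, 0.3, 0.2])   # neighbor's previous belief μ_{j,t-1}
ell_i_s = np.array([0.6, 0.6, 0.1])     # ℓ_i(s_{i,t} | ·) evaluated at the observed signal

# Route 1: normalized Bayesian update (3), then the log belief ratio.
mu_i = mu_j_prev * ell_i_s / np.sum(mu_j_prev * ell_i_s)
direct = np.log(mu_i[2] / mu_i[0])

# Route 2: log-likelihood ratio of the own signal plus the neighbor's log belief ratio.
recursive = np.log(ell_i_s[2] / ell_i_s[0]) + np.log(mu_j_prev[2] / mu_j_prev[0])
```

Working with log ratios in this way also sidesteps numerical underflow once the beliefs on false states become extremely small.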
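Next, we can iterate the preceding identity to substitute for
$\log(\boldsymbol{\mu}_{j,t-1}(\check\theta)/\boldsymbol{\mu}_{j,t-1}(\theta))$, and so on, from which we get the
expansion (7) at the top of the next page. Also note that
$$\sum_{i_1=1}^{n} \cdots \sum_{i_t=1}^{n} \mathbb{1}_{\{\boldsymbol{\sigma}_{t,i} = i_1\}} \mathbb{1}_{\{\boldsymbol{\sigma}_{t-1,i_1} = i_2\}} \cdots \mathbb{1}_{\{\boldsymbol{\sigma}_{1,i_{t-1}} = i_t\}} = 1$$
almost surely, and in fact everywhere on $\Omega$, so that the
initial prior belief ratio $\log(\nu(\check\theta)/\nu(\theta))$ always appears in the
summation (7) with unit weight, which simplifies (7) accordingly.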

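We now claim that whenever $t \to \infty$ and the network graph
$\mathcal{G}$ is strongly connected, with $\mathbb{P}$-probability one the likelihood
ratios of private signals from any node $m \in [n]$ appear in the
summation (7) as $\ell_m(\mathbf{s}_{m,t-\tau} \mid \check\theta)/\ell_m(\mathbf{s}_{m,t-\tau} \mid \theta)$ for infinitely
many values of $\tau$. The gist of the proof is in realizing the
correspondence between the summation (7) and a random walk
on the directed graph $\mathcal{G}$ that starts at time $t$ on node $i$, proceeds
in the reversed time direction, and terminates at time zero.
The jumps in this random walk are made from each node $i$
to one of her in-neighbors $j \in \mathcal{N}(i)$, in accordance with
the probabilities $p_{ij}$ specified by the matrix $P = [p_{ij}]$. Indeed,
we can denote the random sequence of nodes that are hit by
this random walk as $(i, i_1, \ldots, i_t)$, where the random variables
$i_\tau \in [n]$, $\tau \in [t]$, are defined recursively by $i_1 := \boldsymbol{\sigma}_{t,i}$,
$i_2 := \boldsymbol{\sigma}_{t-1,i_1}$, $i_3 := \boldsymbol{\sigma}_{t-2,i_2}$, $\ldots$, $i_t := \boldsymbol{\sigma}_{1,i_{t-1}}$. Whence
(7) is written succinctly as
$$\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = \log\frac{\nu(\check\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \check\theta)}{\nu(\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \theta)} + \sum_{\tau=1}^{t} \log\frac{\ell_{i_\tau}(\mathbf{s}_{i_\tau,t-\tau} \mid \check\theta)}{\ell_{i_\tau}(\mathbf{s}_{i_\tau,t-\tau} \mid \theta)}. \tag{8}$$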
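The random-walk correspondence in (8) can be verified numerically. The minimal sketch below, under a hypothetical network, hypothetical likelihoods, seed, and horizon, simulates update (3) forward in time while recording the signals and the neighbor choices $\boldsymbol{\sigma}_{t,i}$. It then recomputes each agent's log belief ratio by walking backward from node $i$ through the recorded choices, adding the prior term, the agent's own current signal, and the log-likelihood ratios of the signals picked up along the way. The two routes should agree to floating-point accuracy.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 3, 25                         # agents and horizon (hypothetical)
true_state, false_state = 0, 1       # θ and θ̌ as indices
nu = np.array([0.5, 0.5])            # common prior ν
# Hypothetical likelihoods ells[i][θ, s] = ℓ_i(s | θ) over binary signals.
ells = [np.array([[0.7, 0.3], [0.4, 0.6]]),
        np.array([[0.5, 0.5], [0.5, 0.5]]),   # agent 1 learns nothing on her own
        np.array([[0.2, 0.8], [0.6, 0.4]])]
P = np.array([[0.0, 0.5, 0.5],       # p_ij: probability that agent i contacts j
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Record signals s[i, t] (drawn under the true state) and choices sigma[t, i].
s = np.array([rng.choice(2, size=T + 1, p=e[true_state]) for e in ells])
sigma = np.array([rng.choice(n, size=T + 1, p=P[i]) for i in range(n)]).T

# Forward pass: initial Bayesian beliefs, then update (3) for t = 1, ..., T.
mu = np.array([nu * ells[i][:, s[i, 0]] for i in range(n)])
mu /= mu.sum(axis=1, keepdims=True)
for t in range(1, T + 1):
    mu = np.array([mu[sigma[t, i]] * ells[i][:, s[i, t]] for i in range(n)])
    mu /= mu.sum(axis=1, keepdims=True)

def llr(m, signal):
    """log ℓ_m(signal | θ̌) / ℓ_m(signal | θ)"""
    return np.log(ells[m][false_state, signal] / ells[m][true_state, signal])

def log_ratio_via_walk(i, t):
    """Right-hand side of (8): prior and own-signal terms plus signals picked up backward."""
    total = np.log(nu[false_state] / nu[true_state]) + llr(i, s[i, t])
    node = i
    for tau in range(1, t + 1):
        node = sigma[t - tau + 1, node]          # i_tau := σ_{t-tau+1, i_{tau-1}}
        total += llr(node, s[node, t - tau])
    return total

direct = np.log(mu[:, false_state] / mu[:, true_state])
via_walk = np.array([log_ratio_via_walk(i, T) for i in range(n)])
```

The backward loop is exactly the recursion $i_1 := \boldsymbol{\sigma}_{t,i}$, $i_\tau := \boldsymbol{\sigma}_{t-\tau+1, i_{\tau-1}}$, so the sketch makes the "signals picked up by a random walk" picture concrete.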


As $t \to \infty$, the sequence $i_\tau$, $\tau \in \mathbb{N}$, forms a Markov process
with transition matrix $P$. Given (8), our claim can be restated
as follows: for every $m \in [n]$, as $t \to \infty$ there are infinitely
many values of $\tau \in \mathbb{N}$ for which $i_\tau = m$. This is true because
in a finite-state Markov chain with transition matrix $P$ every
state is persistent (recurrent) and will be hit infinitely many
times provided that the directed graph $\mathcal{G}$ is strongly connected
[25, Theorem 1.5.6]; i.e., for all $m \in [n]$,
$$\mathbb{P}\{i_\tau = m \text{ for infinitely many } \tau\} = 1.$$


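For any agent $m \in [n]$, let $\mathcal{T}_m := \{\tau_{m,j}, j \in \mathbb{N}\}$ be the
sequence of stopping times that record the first, second, and
so on passage times of node $m$ by the process $i_\tau$, $\tau \in \mathbb{N}$.
That is, $\tau_{m,1} = \inf\{\tau \in \mathbb{N} : i_\tau = m\}$ and, for $j > 1$,
$\tau_{m,j} = \inf\{\tau > \tau_{m,j-1} : i_\tau = m\}$. Using the above notation,
(8) can be rewritten as
$$\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = \log\frac{\nu(\check\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \check\theta)}{\nu(\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \theta)} + \sum_{m=1}^{n} \sum_{\substack{\tau \in \mathcal{T}_m \\ \tau \le t}} \log\frac{\ell_m(\mathbf{s}_{m,t-\tau} \mid \check\theta)}{\ell_m(\mathbf{s}_{m,t-\tau} \mid \theta)}. \tag{9}$$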
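The regrouping of (8) into per-node sums over passage times is pure bookkeeping. The minimal sketch below, with a hypothetical walk realization and hypothetical per-step log-likelihood ratios, computes the passage times $\tau_{m,1} < \tau_{m,2} < \cdots$ of each node and confirms that summing over $\tau$ or over nodes $m$ and their passage times gives the same total. The helper `passage_times` is an illustrative name.

```python
import numpy as np

def passage_times(path, m):
    """Return the 1-based indices tau at which the walk i_1, i_2, ... visits node m."""
    return [tau for tau, node in enumerate(path, start=1) if node == m]

# Hypothetical realization i_1, ..., i_6 (t = 6) and per-step terms
# x[tau - 1] = log ℓ_{i_tau}(s_{i_tau, t - tau} | θ̌) / ℓ_{i_tau}(s_{i_tau, t - tau} | θ).
path = [2, 0, 1, 2, 0, 2]
x = np.array([-0.3, 0.1, 0.0, -0.5, 0.2, -0.1])

by_time = x.sum()                                         # summation over tau, as in (8)
by_node = sum(x[[tau - 1 for tau in passage_times(path, m)]].sum()
              for m in set(path))                         # regrouped by node m and its passage times
```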
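On the other hand, note that
$\left\{\log\left(\ell_m(\mathbf{s}_{m,t-\tau_{m,j}} \mid \check\theta)/\ell_m(\mathbf{s}_{m,t-\tau_{m,j}} \mid \theta)\right), j \in \mathbb{N}\right\}$ is a
sequence of independent and identically distributed random variables,
so that by the strong law of large numbers we obtain that, with
$\mathbb{P}$-probability one,
$$\lim_{k \to \infty} \frac{1}{k} \sum_{j=1}^{k} \log\frac{\ell_m(\mathbf{s}_{m,t-\tau_{m,j}} \mid \check\theta)}{\ell_m(\mathbf{s}_{m,t-\tau_{m,j}} \mid \theta)} = \mathbb{E}\left\{\log\frac{\ell_m(\mathbf{s}_{m,0} \mid \check\theta)}{\ell_m(\mathbf{s}_{m,0} \mid \theta)}\right\} = -D_{\mathrm{KL}}\left(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)\right) \le 0, \tag{10}$$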


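where the non-positivity follows from the information inequality
for the Kullback-Leibler divergence $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$, and it
is strict whenever $\ell_m(\cdot \mid \check\theta) \neq \ell_m(\cdot \mid \theta)$, i.e., $\exists s \in \mathcal{S}_m$ such that
$\ell_m(s \mid \check\theta) \neq \ell_m(s \mid \theta)$ [28, Theorem 2.6.3]. Note that whenever
$\ell_i(\cdot \mid \theta) = \ell_i(\cdot \mid \check\theta)$, or equivalently $D_{\mathrm{KL}}(\ell_i(\cdot \mid \theta) \,\|\, \ell_i(\cdot \mid \check\theta)) = 0$,
the two states $\theta$ and $\check\theta$ are statistically indistinguishable
to agent $i$. In other words, there is no way for agent $i$ to
differentiate $\check\theta$ from $\theta$ based only on her private signals. This
follows from the fact that both $\theta$ and $\check\theta$ induce the same probability
distribution on her sequence of observed i.i.d. signals.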
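The limit in (10) can be illustrated with a minimal NumPy sketch under hypothetical likelihoods for a single agent $m$ with binary signals: the empirical mean of the log-likelihood ratios of i.i.d. signals drawn under the true state approaches $-D_{\mathrm{KL}}(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta))$, and the divergence vanishes exactly when the two likelihood rows coincide. The helper `kl_divergence` is an illustrative name and assumes $\ell_m(s \mid \check\theta) > 0$ wherever $\ell_m(s \mid \theta) > 0$.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_s p(s) log(p(s) / q(s)); assumes q(s) > 0 wherever p(s) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

ell_true = np.array([0.7, 0.3])     # hypothetical ℓ_m(· | θ), θ the true state
ell_false = np.array([0.4, 0.6])    # hypothetical ℓ_m(· | θ̌), a false state

rng = np.random.default_rng(0)
signals = rng.choice(2, size=200_000, p=ell_true)                 # i.i.d. signals under θ
empirical_mean = np.mean(np.log(ell_false[signals] / ell_true[signals]))
limit = -kl_divergence(ell_true, ell_false)                       # approximately -0.184
```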
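On the other hand, having $D_{\mathrm{KL}}(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)) > 0$ for
some agent $m \in [n]$ would ensure, per (10) and the persistence of
state $m$, that with $\mathbb{P}$-probability one
$$\sum_{\substack{\tau \in \mathcal{T}_m \\ \tau \le t}} \log\frac{\ell_m(\mathbf{s}_{m,t-\tau} \mid \check\theta)}{\ell_m(\mathbf{s}_{m,t-\tau} \mid \theta)} \to -\infty$$
as $t \to \infty$ in (9). Since every other node $m'$ contributes either identically zero
terms (when $\ell_{m'}(\cdot \mid \check\theta) = \ell_{m'}(\cdot \mid \theta)$) or a sum that also diverges to $-\infty$,
it follows that $\log(\boldsymbol{\mu}_{i,t}(\check\theta)/\boldsymbol{\mu}_{i,t}(\theta)) \to -\infty$
for all agents $i \in [n]$ and any such $\check\theta \in \Theta$, $\check\theta \neq \theta$. Indeed, having
$\log(\boldsymbol{\mu}_{i,t}(\check\theta)/\boldsymbol{\mu}_{i,t}(\theta)) \to -\infty$ for all $\check\theta \neq \theta$ is necessary and
sufficient for learning, and we therefore have the following
characterization.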


Definition 1 (Global Identifiability). In a strongly connected
topology, the true state $\theta$ is globally identifiable if, for
all $\check\theta \neq \theta$, there exists some agent $m \in [n]$ such that
$D_{\mathrm{KL}}(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)) > 0$, i.e., $m$ can distinguish between
$\theta$ and $\check\theta$ based only on her private signals.
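Given the likelihood tables, the signal condition in Definition 1 can be checked mechanically. In the minimal sketch below, `globally_identifiable` is a hypothetical helper that tests only the divergence condition (strong connectivity of $\mathcal{G}$ must be verified separately), and the two-agent, three-state likelihoods are made-up examples in which each agent alone confuses the true state with a different false state.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q); assumes q(s) > 0 wherever p(s) > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def globally_identifiable(ells, true_state):
    """True if every false state has an agent m with D_KL(ℓ_m(·|θ) || ℓ_m(·|θ̌)) > 0.

    ells[m][θ, s] = ℓ_m(s | θ). Only the signal condition of Definition 1 is checked.
    """
    num_states = ells[0].shape[0]
    return all(
        any(kl_divergence(ell[true_state], ell[false_state]) > 0 for ell in ells)
        for false_state in range(num_states) if false_state != true_state
    )

# Hypothetical likelihoods: agent 0 confuses states 0 and 1, agent 1 confuses states 0 and 2.
ells = [np.array([[0.8, 0.2], [0.8, 0.2], [0.3, 0.7]]),
        np.array([[0.6, 0.4], [0.1, 0.9], [0.6, 0.4]])]
```

With both agents the truth is identifiable, while either agent alone fails the test, mirroring the situation in which communication is necessary for learning.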

We have thus established the conditions for learning under
the without recall updates in (1) and (2), where the neighbor $j$
is chosen randomly with strictly positive probabilities specified
in the transition matrix $P$. We dub this procedure “gossips without
recall” and summarize our findings as follows:

Theorem 1 (Almost-Sure Learning). Under the gossips without
recall updates in a strongly connected network where
the truth is globally identifiable, all agents learn the truth
asymptotically almost surely.

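We can extend the above analysis to derive an asymptotic
rate of learning for the agents that is exponentially fast
and is expressed as $-\sum_{m=1}^{n} \pi_m D_{\mathrm{KL}}(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)) < 0$,
where $\pi := (\pi_1, \ldots, \pi_n)$ is the stationary distribution of the
transition matrix $P$, which for a strongly connected $\mathcal{G}$ is the
unique probability distribution on $[n]$ satisfying $\pi P = \pi$.
The rate is strictly negative under global identifiability because
$\pi_m > 0$ for every $m \in [n]$ in an irreducible finite-state chain.
To see how, for each agent $m \in [n]$ and all times $t$, define
$\mathcal{T}_m(t) := \{\tau_{m,j}, j \in \mathbb{N} : \tau_{m,j} \le t\}$ and divide both sides of
(9) by $t$ to obtain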


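$$\frac{1}{t}\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = \frac{1}{t}\log\frac{\nu(\check\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \check\theta)}{\nu(\theta)\,\ell_i(\mathbf{s}_{i,t} \mid \theta)} + \sum_{m=1}^{n} \frac{|\mathcal{T}_m(t)|}{t} \cdot \frac{1}{|\mathcal{T}_m(t)|} \sum_{\tau \in \mathcal{T}_m(t)} \log\frac{\ell_m(\mathbf{s}_{m,t-\tau} \mid \check\theta)}{\ell_m(\mathbf{s}_{m,t-\tau} \mid \theta)}.$$

Upon invoking (10) we obtain
$$\lim_{t \to \infty} \frac{1}{t}\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = -\sum_{m=1}^{n} \left(\lim_{t \to \infty} \frac{|\mathcal{T}_m(t)|}{t}\right) D_{\mathrm{KL}}\left(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)\right). \tag{11}$$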


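Finally, the ergodic theorem ensures that the average time spent
in any state $m \in [n]$ converges almost surely to its stationary
probability $\pi_m$, i.e., with probability one $\lim_{t \to \infty} |\mathcal{T}_m(t)|/t = \pi_m$
[25, Theorem 1.10.2]. Hence, (11) becomes
$$\lim_{t \to \infty} \frac{1}{t}\log\frac{\boldsymbol{\mu}_{i,t}(\check\theta)}{\boldsymbol{\mu}_{i,t}(\theta)} = -\sum_{m=1}^{n} \pi_m D_{\mathrm{KL}}\left(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta)\right),$$
completing the proof of the claimed asymptotically exponentially
fast rate.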
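The rate formula lends itself to a quick numerical check. The minimal sketch below, under a hypothetical three-agent network, hypothetical likelihoods, and a fixed seed, computes the predicted exponent $-\sum_m \pi_m D_{\mathrm{KL}}(\ell_m(\cdot \mid \theta) \,\|\, \ell_m(\cdot \mid \check\theta))$ from the stationary distribution of $P$ and compares it with $\frac{1}{T}\log(\boldsymbol{\mu}_{i,T}(\check\theta)/\boldsymbol{\mu}_{i,T}(\theta))$ from a simulation of the gossips without recall dynamics. The simulation propagates log belief ratios directly, using the recursion obtained by taking logarithms of the belief ratio, so that beliefs on the false state do not underflow over a long horizon. All names and numbers are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q); assumes q(s) > 0 wherever p(s) > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible row-stochastic P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical setup: 3 agents, two states (index 0 = θ, index 1 = θ̌), binary signals.
true_state, false_state, n, T = 0, 1, 3, 20_000
nu = np.array([0.5, 0.5])
ells = [np.array([[0.7, 0.3], [0.4, 0.6]]),
        np.array([[0.5, 0.5], [0.5, 0.5]]),
        np.array([[0.2, 0.8], [0.6, 0.4]])]
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

pi = stationary_distribution(P)
predicted = -sum(pi[m] * kl_divergence(ells[m][true_state], ells[m][false_state])
                 for m in range(n))

rng = np.random.default_rng(3)
llr = np.array([np.log(e[false_state] / e[true_state]) for e in ells])   # llr[i, s]
signals = np.array([rng.choice(2, size=T + 1, p=e[true_state]) for e in ells])
sigma = np.array([rng.choice(n, size=T + 1, p=P[i]) for i in range(n)])
agents = np.arange(n)

# L[i] = log μ_{i,t}(θ̌)/μ_{i,t}(θ): start from the initial Bayesian belief ...
L = np.log(nu[false_state] / nu[true_state]) + llr[agents, signals[:, 0]]
# ... then apply L_{i,t} = llr_i(s_{i,t}) + L_{σ_{t,i}, t-1} for t = 1, ..., T.
for t in range(1, T + 1):
    L = llr[agents, signals[:, t]] + L[sigma[:, t]]

empirical = L / T
```

Agent 1 contributes nothing to `predicted` (her divergence is zero), yet her empirical exponent matches the others', which is the "same asymptotic rate for all agents" behavior noted in the example below.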

Example 1. Eight Agents with Binary Signals in a Tri-State 
World. 
Fig. 2: Evolution of the second agent's beliefs over time
As an illustration, consider the network of agents in Fig. 1,
with the true state of the world being 1, the first of the three
possible states $\Theta = \{1, 2, 3\}$. The likelihood structure for the
first three agents is given in the table; note that none of
them can learn the truth on their own. Indeed, agent 3 does
not receive any informative signals, and her beliefs would never
depart from their initial priors under (1). We further set
$\ell_j(\cdot \mid \cdot) = \ell_3(\cdot \mid \cdot)$ for all $j \in [8] \setminus [3]$, so that all the remaining
agents are also unable to infer anything about the true state of
the world from their own private signals.



Fig. 1: Network Structure for Example 1 


Fig. 3: The difference between the third and eighth agents'
beliefs over time
Starting from a uniform common prior and following the
proposed gossip without recall scheme with neighbors chosen
uniformly at random, all agents asymptotically learn the true
state, even though none of them can learn the true state on their
own. The plots in Figs. 2 and 3 depict the belief evolution for
the second agent, as well as the difference between the beliefs
of the third and eighth agents. It is further observable that
all agents learn the true state at the same exponentially fast
asymptotic rate.

IV. Concluding Remarks 

This work addressed a social and observational learning
model in multi-agent networks. Agents attempt to learn some
unknown state of the world which belongs to a finite state
space. Conditioned on the true state, a sequence of i.i.d.
private signals is generated and observed by each agent of
the network. The private signals do not provide each agent
with adequate information to identify the truth. Hence, agents
contact their neighbors to augment their imperfect observations
with those of their neighbors. In our model, at every time step, each
agent picks a neighbor randomly and updates her belief using
the prior of that particular neighbor together with the likelihood
of her own private signal. The communication protocol is an
instance of learning without recall and is implemented in
such a way that the signal likelihoods that comprise an agent's
belief are picked up by a random walk on the network graph.
We proved that agents learn the truth exponentially fast and
in the almost-sure sense, provided that the network is strongly
connected and the truth is globally identifiable. The asymptotic rate is
expressed as a weighted sum of the relative entropies between
the signal structures of each agent, where the weights come
from the stationary distribution of the transition probability
matrix according to which neighbors are chosen at every time
instant.